Philippine Languages Online Corpora: Status, issues, and prospects
Abstract
This paper presents the work done so far on building an online corpus for Philippine languages. The Philippine Languages Online Corpora (PLOC) currently comprises a 250,000-word written corpus of the eight major languages of the archipelago. The paper also discusses some of the issues confronting the corpus-building effort and future directions for the project.
Similar resources
Building Online Corpora of Philippine Languages
This paper describes the building of online corpora of Philippine languages as part of Palito, an online repository system. The corpora have five components: the four major Philippine languages (Tagalog, Cebuano, Ilocano, and Hiligaynon) and Filipino Sign Language (FSL). The four spoken languages are composed of 250,000-word written texts each, whereas the FS...
Philippine Language Resources: Trends and Directions
We present the diverse research activities on Philippine languages from all over the country, with a focus on the Center for Language Technologies of the College of Computer Studies, De La Salle University, Manila, where the majority of the work is conducted. These projects include the formal representation of Philippine languages and the processes involving these languages. Language representation ...
AutoCor: A Query Based Automatic Acquisition of Corpora of Closely-related Languages
AutoCor is a method for the automatic acquisition and classification of corpora of documents in closely-related languages. It extends and enhances CorpusBuilder, a system that automatically builds specific minority-language corpora from a closed corpus; the extension is needed because some Tagalog documents retrieved by CorpusBuilder are actually documents in other closely-related Philippine languages. Aut...
e-Wika: Philippine Connectivity through Language
In this paper, we present what we have done towards connecting the Philippine islands through the digitization of the Philippine languages and their respective applications, and what we intend to do in the future. We present the development of a multi-engine bi-directional English-Filipino Machine Translation (MT) system, and the building of various language resources and tools for this ...